# Documentation Strategy — The Problem Is Entry Points, Not Quantity

You have **414 markdown files** in `doc/`. That's not a documentation shortage — it's the opposite. The problem is that someone new to the product, a partner integration team, a customer architecture reviewer, or a new operator has nowhere to start. They land on a 164-doc architecture folder and bounce.

Look at the surface:

| Folder              | Files | Purpose                          |
| ------------------- | ----- | -------------------------------- |
| `doc/architecture/` | 164   | Specs / ADRs / detailed designs  |
| `doc/operations/`   | 110   | Runbooks, evidence, profiles     |
| `doc/product/`      | 93    | UX, IA, journeys, gaps           |
| `doc/governance/`   | 47    | Coding/testing standards, queues |
| `doc/api/`          | 3     | OpenAPI/AsyncAPI artifacts       |

All of these are **reference material** — canonical decisions that engineers reach into when they need authority. None of them are **narrative entry points** that explain "what is GPUaaS, how does it work, what's the architecture, how do I deploy it, how do I run it." The 6 top-level files (`README.md`, `AGENTS.md`, `CLAUDE.md`, etc.) are agent / contributor instructions, not product documentation.

This is the classic *"plenty of trees, no map of the forest"* pattern. You don't need fewer trees. You need a map.

## The fix: a small set of entry points, layered by audience

Don't reorganize the 414 existing docs. They're working as canonical reference. **Add 6–8 entry docs that link into them**, organized by audience and purpose. Same composition discipline you apply everywhere else: each entry doc has one purpose, references existing specs as authority, never duplicates.

```
doc/
├── README.md                       ← rebrand current root README into this
├── 01_Product.md                   ← "What is GPUaaS, who is it for, why does it exist"
├── 02_Architecture_Overview.md     ← short architecture overview, links to specs
├── 03_Security_Model.md            ← trust boundaries, identity, isolation
├── 04_Deployment_Profiles.md       ← 5 edge profiles in one place
├── 05_Operations_Map.md            ← failure modes → runbook index
├── 06_Glossary.md                  ← every term defined once
├── for_app_developers/             ← App SDK getting-started
│   ├── README.md
│   ├── Quick_Start.md
│   └── ...
├── for_operators/                  ← Existing operations docs, but with an index
│   └── README.md                   ← curated map of doc/operations/
├── for_engineers/                  ← Onboarding for new hires
│   ├── README.md
│   └── Codebase_Tour.md
├── architecture/                   ← UNCHANGED (164 files)
├── product/                        ← UNCHANGED (93 files)
├── governance/                     ← UNCHANGED (47 files)
├── operations/                     ← UNCHANGED (110 files)
└── api/                            ← UNCHANGED (3 files)
```

The existing `doc/architecture/*`, `doc/operations/*`, etc. stay where they are. They are the **specs**. The new top-level docs are the **narrative bridges**. Same shape as the App SDK contract: a small closed set of entry points, an open expanding set of detailed specs underneath.

## What each entry doc does

| Doc                           | Audience                                    | Length              | Purpose                                                      |
| ----------------------------- | ------------------------------------------- | ------------------- | ------------------------------------------------------------ |
| `01_Product.md`               | Everyone (sales, customers, new hires)      | 2 pages             | What GPUaaS is, who pays for it, what differentiates it from competitors |
| `02_Architecture_Overview.md` | Engineers, integrators, customer architects | 3-5 pages + diagram | The five-layer model (App contract → route intent → tenant isolation → data plane → edge profile). Each section links to canonical specs |
| `03_Security_Model.md`        | Customers, security reviewers, auditors     | 3-5 pages           | Trust boundaries, identity model, BlueField/host split, mTLS topology, audit model. Links to PKI spec, App SDK §3, Tenant Isolation doc |
| `04_Deployment_Profiles.md`   | Customers, partners, ops                    | 2-3 pages           | The 5 edge profiles (`kind_cloudflare` / `kind_local_dns` / `prod_public_ingress` / `prod_private_ingress` / `airgapped_private_ca`) in one place |
| `05_Operations_Map.md`        | Operators, on-call                          | 2-3 pages           | Failure-mode index → runbook references. The "where do I look when X is broken" map |
| `06_Glossary.md`              | Everyone                                    | 2-3 pages           | Every term: allocation, slice, app instance, scheduler app, building block, proxy pool, route family, etc. Define once, reference everywhere |
| `for_app_developers/`         | External app developers                     | Multi-doc           | Onboarding for adding apps via the SDK; manifest examples; auth pattern decision tree |
| `for_engineers/`              | New hires, cross-team                       | Multi-doc           | Code structure tour, "find the X by reading Y" map, common workflows |
| `for_operators/`              | Internal SRE                                | Multi-doc           | Curated index of existing runbooks; observability map        |

**Total new content: 6-8 entry docs (~15-20 pages) plus 3 audience-specific subdir READMEs.** Not 414. Not 50. A handful of focused docs that act as the front door to the existing material.

## The discipline rules (same as everywhere else)

1. **Entry docs link, don't duplicate.** When `02_Architecture_Overview.md` describes the proxy layer, it links to `Platform_Proxy_OSS_Data_Plane_ADR_v1.md` for authority. If the ADR changes, the entry doc needs at most a touch-up to its short summary, never a rewrite of copied detail.
2. **Specs are authoritative; entry docs are narrative.** If the entry doc says X and the spec says Y, the spec wins. Reviewers fix the entry doc.
3. **Audience-tagged.** Each entry doc declares who it's for in the first line. Customers don't read the engineer onboarding; engineers don't read sales material.
4. **Length budget.** Entry docs have hard length budgets (2-5 pages). When a doc exceeds its budget, that's a signal to split it or push detail into a spec.
5. **Same versioning discipline.** Major architectural changes (a new edge profile, a new endpoint type, a new building block) update the relevant entry doc *and* the spec, in the same PR. The spec is the contract; the entry doc is the explanation.
6. **Maintain via cross-references.** Every spec doc starts with "Related: <entry doc>" pointing back. This lets reviewers see at a glance which entry doc to update when they touch a spec.
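
Rules 3, 4, and 6 are mechanical enough to lint. The sketch below is one possible shape for such a check, not an existing tool. It assumes conventions the rules imply but do not spell out: an entry doc's first non-heading line starts with `Audience:`, a spec's first non-heading line starts with `Related:`, and a page is roughly 500 words. The function names and the words-per-page figure are hypothetical.

```python
import re

WORDS_PER_PAGE = 500  # assumption: rough words-per-page conversion
MAX_PAGES = 5         # upper end of the 2-5 page budget in rule 4


def first_body_line(text: str) -> str:
    """Return the first non-blank line that is not a Markdown heading."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return stripped
    return ""


def lint_entry_doc(text: str) -> list[str]:
    """Check rule 3 (audience tag) and rule 4 (length budget) for one entry doc."""
    problems = []
    if not first_body_line(text).lower().startswith("audience:"):
        problems.append("missing 'Audience:' line at the top")
    words = len(text.split())
    if words > WORDS_PER_PAGE * MAX_PAGES:
        problems.append(f"over length budget: {words} words")
    return problems


def lint_spec(text: str, entry_docs: set[str]) -> list[str]:
    """Check rule 6: the spec opens with 'Related:' naming a known entry doc."""
    match = re.match(r"related:\s*(.+)", first_body_line(text), re.IGNORECASE)
    if not match:
        return ["missing 'Related:' back-reference"]
    target = match.group(1).strip().strip("`")
    if target not in entry_docs:
        return [f"'Related:' points at unknown entry doc: {target}"]
    return []
```

Run over `doc/*.md` for entry docs and the four spec folders for specs, a check like this could gate merges. The thresholds are a starting point to tune, not a standard.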

## The first 4 docs to write (highest leverage)

If I were starting this today, I'd write these four first because they unlock the rest:

### 1. `02_Architecture_Overview.md` — the missing front door

The five-layer narrative I keep drawing for you:

```
App contract → Route intent → Tenant isolation → Data plane → Edge profile
```

Plus the BlueField / host trust split, plus the major workers, plus the App SDK boundary. A single doc with a diagram and 3-4 paragraphs per layer, each linking to its canonical spec. Probably the single highest-leverage doc to write: half of your existing docs become *immediately more legible* once this exists as the entry point.

### 2. `03_Security_Model.md` — the customer/auditor doc

Trust boundaries (platform vs host vs BlueField vs user vs tenant); identity model (User CA, Host CA, tier-1 OIDC, tier-2 native auth); audit model; isolation guarantees. Customers in regulated industries will demand this; sales engineering will want it; security review boards will ask for it. Currently scattered across 8+ specs.

### 3. `06_Glossary.md` — the slow burn that compounds

GPUaaS has invented or specialized many terms: allocation, slice, app instance, scheduler app, building block, proxy pool, route family, managed_ingress, client auth mode, route intent, target binding, edge profile, drift, etc. Define once. Every other doc references. This is the lowest-effort, highest-multiplier doc in the set.

### 4. `for_app_developers/Quick_Start.md` — the external SDK doc

Right now an external app developer would have to read App_SDK_Design_Principles + Launchable_OCI_Workload_Profile_Contract + manifest examples + 5 other docs to ship an app. A 1-2 page quick-start ("here's a 30-line manifest, here's what it produces, here's how to test it locally") makes the App SDK genuinely consumable for external developers. The composition strategy depends on this being easy.

Probably **1 week of focused writing** at your team's pace to do these four. Lower-leverage entry docs (operator map, deployment profiles narrative, engineer onboarding) can follow as they're needed.

## What NOT to do

- **Don't reorganize the 414 existing docs.** They're a working canonical reference. Moving them invalidates every cross-link. The composition discipline says: entry docs reach into them, don't replace them.
- **Don't try to write all entry docs at once.** Four is enough to socialize the core architecture. The rest can be lazy-built when an audience asks for them.
- **Don't make the entry docs the source of truth.** They're narrative summaries. Specs remain authoritative. When in doubt, link to the spec.
- **Don't auto-generate from specs.** It's a tried-and-failed pattern: auto-generated docs inherit the density and structure of their source, so they are no easier to enter. Hand-written narrative is the value.
- **Don't write for the team only.** The framing failure right now is that all the docs are written *by* engineers *for* engineers. Customers, partners, and pre-sales need different docs than the team does. The audience-tagged subdirs are the discipline.
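
The cross-link risk in the first bullet can be checked. Below is a minimal sketch of a relative-link checker, assuming standard Markdown `[text](target)` links and a known set of repo paths. The function name `broken_links` is hypothetical, and the location of the ADR under `doc/architecture/` in the usage example is assumed for illustration.

```python
import posixpath
import re

# Matches inline Markdown links and captures the target: [text](target)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Matches a URL scheme prefix such as "https:" or "mailto:"
SCHEME_RE = re.compile(r"[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def broken_links(doc_path: str, text: str, existing: set[str]) -> list[str]:
    """Return relative link targets in `text` that do not resolve to a known file."""
    base = posixpath.dirname(doc_path)
    missing = []
    for target in LINK_RE.findall(text):
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue  # external URL or in-page anchor: out of scope here
        path = target.split("#", 1)[0]  # drop any "#section" fragment
        resolved = posixpath.normpath(posixpath.join(base, path))
        if resolved not in existing:
            missing.append(target)
    return missing


existing = {
    "doc/02_Architecture_Overview.md",
    "doc/architecture/Platform_Proxy_OSS_Data_Plane_ADR_v1.md",  # assumed location
}
text = (
    "See the [proxy ADR](architecture/Platform_Proxy_OSS_Data_Plane_ADR_v1.md#scope) "
    "and the [old spec](architecture/Old_Name.md)."
)
print(broken_links("doc/02_Architecture_Overview.md", text, existing))
```

Running it before and after any proposed move shows how many links would break, which makes the cost of a reorganization concrete.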

## Maintenance pattern at your velocity

Sustainable for a small team:

1. **Spec PRs cite the entry doc to update.** "This changes the route intent shape; update `02_Architecture_Overview.md §3` accordingly." A line in the PR template enforces it.
2. **Quarterly entry-doc review.** ~1 hour every quarter, walk through the 6-8 entry docs, fix drift. The compounding cost stays low because length is bounded.
3. **Customer-question feedback loop.** Every time pre-sales or a customer asks "I couldn't find X in your docs," that's a signal — fix the relevant entry doc rather than answer the question manually.
4. **Track "did the new hire find this in <30 min?"** as a soft KPI for entry-doc quality.
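
Step 1 can be backed by a CI check alongside the PR-template line. The sketch below assumes the entry-doc names and folder layout from the tree above, and treats any mention of an entry doc in the PR description as a citation. The function name and the example spec path `Route_Intent.md` are hypothetical.

```python
from typing import Optional

# Entry docs and spec folders as proposed in the layout above.
ENTRY_DOCS = (
    "01_Product.md",
    "02_Architecture_Overview.md",
    "03_Security_Model.md",
    "04_Deployment_Profiles.md",
    "05_Operations_Map.md",
    "06_Glossary.md",
)
SPEC_DIRS = (
    "doc/architecture/",
    "doc/operations/",
    "doc/product/",
    "doc/governance/",
    "doc/api/",
)


def entry_doc_check(changed_files: list[str], pr_body: str) -> Optional[str]:
    """Return an error message if specs changed without touching or citing an entry doc."""
    touched_specs = [f for f in changed_files if f.startswith(SPEC_DIRS)]
    if not touched_specs:
        return None  # no spec changes, nothing to enforce
    entry_paths = {f"doc/{name}" for name in ENTRY_DOCS}
    touched_entry = any(f in entry_paths for f in changed_files)
    cited_entry = any(name in pr_body for name in ENTRY_DOCS)
    if touched_entry or cited_entry:
        return None
    return f"{len(touched_specs)} spec file(s) changed but no entry doc updated or cited"


# Hypothetical spec path, for illustration only.
print(entry_doc_check(["doc/architecture/Route_Intent.md"], "refactor route intent"))
```

Accepting a citation such as "02_Architecture_Overview.md: no change needed" keeps the check cheap to satisfy while still forcing the author to look.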

## One framing to land with the team

Same principle that runs your composition strategy: **entry points are a closed enum, specs are an open set**. You've been building specs (414 of them) without entry points. That's why your docs feel hard to socialize even though they're high-quality. Add the entry points, and the existing specs immediately become far easier to find and use.

The doc set has the same shape problem your App SDK was solving for: lots of careful canonical decisions, no friendly surface for the people who consume them. Same fix: a small, opinionated, maintained set of entry points that delegate to specs for authority.

Write the four entry docs above; the rest follows naturally.